2.2 Main report
2.2.1 Overview of PhD
I am on a 3.5-year GW4 BioMed MRC DTP PhD. I am in my third year and expect to finish April 2021. My maximum submission date is 03/10/2021. The year 1 report and presentation can be downloaded from GitHub.
2.2.1.1 Rationale
The number of individuals who are overweight or obese is at an all-time high. Globally, 39% and 13% of adults (18+) are estimated to be overweight and obese, respectively1 (Figures 1 and 2), and these proportions are expected to continue to rise2–4. It is estimated that obesity is responsible for 8% of global deaths5 (Figure 3). With the number of overweight and obese individuals increasing2–4, it is likely that the number of premature deaths will rise too.
Figure 1: Proportion of overweight individuals
Figure 1, reproduced from Ritchie and Roser (2019)6, shows the share of adults (18+) that are overweight globally and in 5 selected geographic regions (Americas, Europe, Eastern Mediterranean, Africa and South East Asia) from 1975 to 2016.
Figure 2: Proportion of obese individuals
Figure 2, reproduced from Ritchie and Roser (2019)6, shows the share of adults (18+) that are obese globally and in 5 selected geographic regions (Americas, Europe, Eastern Mediterranean, Africa and South East Asia) from 1975 to 2016.
Figure 3: Number of deaths by risk factor
Figure 3, reproduced from Ritchie and Roser (2019)6, shows the number of deaths for 26 risk factors globally in 2017 for all age groups. Obesity is the 5th leading cause of death with 3.41 million deaths in 2017.
Conventionally, overweight and obesity are measured using body mass index (BMI), with overweight and obesity classified as a BMI of 25–29.9 kg/m² and ≥ 30 kg/m² respectively. A normal weight classification is a BMI of 18.5–24.9 kg/m², with an underweight class below this. In the crudest sense, BMI is a measure of weight adjusted for height. BMI is associated with numerous diseases and, for many of them, provides an accurate measure of risk at a population level. However, BMI does not have the resolution to accurately measure an individual’s body composition7–10, i.e. the amount and location of adipose tissue within the body, and studies have pointed to a more important role for fat deposition in disease development11,12. As such, complementary assessment of increased adiposity using a combination of body composition measures (e.g. BMI, waist hip ratio, body fat %) may provide additional information on associations with disease13,14.
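As a minimal illustration of the BMI definition and the conventional cut-offs described above, the following Python sketch computes BMI from weight and height and assigns a category. The function names and the example weights and heights are illustrative only and are not part of the thesis analyses.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Assign the conventional BMI category (cut-offs in kg/m^2)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    # Hypothetical individuals: (weight in kg, height in m)
    for weight, height in [(55, 1.80), (70, 1.75), (85, 1.75), (95, 1.75)]:
        value = bmi(weight, height)
        print(f"{weight} kg, {height} m: BMI {value:.1f} ({bmi_category(value)})")
```

Two individuals with the same BMI fall in the same category regardless of where their fat is deposited, which is the limitation that motivates using WHR and BF% alongside BMI.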
Adipose tissues are prolific signalers to surrounding and systemic tissues15,16, leading to large downstream effects with potentially harmful consequences16–19. Changes to adipose tissue abundance are reflected in adipocyte signaling. These changes are concurrent with shifts in metabolic profiles, where alterations to the level of one metabolite do not occur in isolation. Metabolites sit at the interface between genetic and non-genetic factors and provide a useful read-out of physiological function. A number of GWASs of metabolites have now been performed, identifying many strongly associated SNPs which, in some cases, explain large proportions of variance20–22. The number of available GWASs is increasing and, with ever more studies undertaking metabolomics work and many already possessing genetic information on participants, it is likely that much larger metabolomics GWASs will soon be available to the community.
2.2.1.2 Aims
The biological pathway from increased adiposity to disease development is unclear. Adipose tissue is a prolific signaling organ resulting in systemic changes across the body. Metabolic changes may be a result of increased adiposity and subsequent signaling and evidence has highlighted the role of metabolites in disease. The aim of this thesis is to:
- Identify metabolites that sit on the causal pathway from increased adiposity to disease
2.2.1.3 Objectives
In order to achieve this aim and better understand the biological mechanisms underlying disease development this thesis will:
- Identify all traits causally associated with increased adiposity
- I will perform a systematic review of all Mendelian randomization studies investigating measures of increased adiposity with any outcome
- I will use the findings from this review to guide the outcomes of interest for this thesis
- Identify and describe appropriate instrumentation of increased adiposity
- Identify metabolites associated with increased adiposity
- I will use observational and Mendelian randomization analyses to identify metabolites associated with multiple measures of increased adiposity
- Design and implement methods to cluster metabolites
- Metabolites are complicated and highly correlated; I will compare methods to cluster metabolites and propose rules for instrumenting metabolites and clusters for Mendelian randomization analyses
- Identify diseases associated with metabolites
- We will use observational and Mendelian randomization analyses to identify metabolites associated with diseases
- We will be guided by the systematic review and metabolites we identify as associated with increased adiposity
2.2.1.4 Layout
Figure 4: Overview of PhD chapters
Figure 4 shows an overview of proposed chapters for the thesis, including progress to date and expected outcomes, in order to achieve the described aim and objectives.
2.2.2 Chapter progress
The thesis is laid out as a pipeline check-list of what to do when researchers want to understand the causal associations between exposures and outcomes using metabolites as intermediates. Chapter 1 introduces the context of the thesis and what we currently know about increased adiposity and diseases. The pipeline starts with chapter 2, identification of diseases associated with increased adiposity, and progresses through choosing instruments for exposures, performing observational analysis and the first step of an MR, visualising the results, instrumenting metabolites as intermediates and performing the final MR stage of intermediate to disease. The below Gantt chart lays out the plan for chapter progress in the coming 12 months:
2.2.2.1 Chapter 1: Introduction
2.2.2.1.1 Overview
This chapter provides the context of the thesis, i.e. what diseases increased adiposity is associated with. It provides background on adipose tissue and the products of adipose tissue such as metabolites. It gives an overview of observational research looking at diseases associated with increased adiposity. It goes on to explore what metabolites might provide in understanding these associations and how MR may help investigate these associations. This chapter includes the aims and objectives of the thesis which are described above.
2.2.2.2 Chapter 2: Systematic review
2.2.2.2.1 Overview
Chapter 1 shows that the literature is clear that numerous diseases are associated with increased adiposity. However, the causal associations between increased adiposity and these diseases are not as clear. As MR has been increasingly used over the years and more data sets have become available, a large body of evidence has built up for causal associations between increased adiposity and a number of diseases. Chapter 2 sets out to synthesise all of this evidence and identify the diseases causally associated with increased adiposity. These diseases will be used in the second step of the MR to identify whether metabolites are associated with the diseases (Chapter 9). The systematic review is intended to include a meta-analysis; however, the time constraints of the PhD may mean this is not completed within the time frame.
2.2.2.2.2 Progress
~150 papers were identified and included for data extraction. Data extraction is ongoing and is expected to be completed by the end of February, with a draft manuscript/chapter by the end of March.
Data on ~500 MR analyses have been extracted from 55 papers. With roughly 100 papers left, the number of MR analyses for which we will have data could exceed 1,000. This raises the possibility of meta-analysing a large number of exposure-outcome analyses. However, from the data gathered so far I think it is unlikely that many (if any) meta-analyses can be performed, for three reasons. First, few studies so far use data that are independent of another study’s data set. For example, of two MR studies of BMI and lung cancer, one from 2017 and one from 2018, the 2018 study includes the data used in the 2017 study plus additional cases (the 2018 paper is essentially an update of the 2017 paper). Second, the methods used in the MR analyses differ; I am not sure whether it is appropriate to meta-analyse, for example, an IVW estimate and an estimate from a likelihood method. Third, though there are likely to be further reasons why meta-analysis is not appropriate for particular MR analyses, data are often absent. Thus far, few studies have reported enough, and in some cases accurate, information on the data used to perform the MR analysis. For example, studies will state that they have used the European BMI SNPs from the GIANT consortium published in 2015 (Locke et al. 2015) but will then quote the number of SNPs from the all-ancestries GWAS in the Locke paper. This could be checked easily if they provided the list of SNPs used and the betas for those SNPs; however, in many instances they do not. It is therefore difficult to say whether they used the European SNPs and quoted the wrong number, or used the all-ancestries SNPs and quoted the wrong population. These issues make meta-analysis difficult, but I think they are important to focus on in the chapter and paper.
Data extraction work involves a number of other colleagues and I have been really poor at managing this aspect of the project. Having discussed this with Kaitlin, I hope to now have a better handle on the management of the data extraction and its progress going forward.
2.2.2.3 Chapter 3: Instrumentation
2.2.2.3.1 Overview
Before starting an analysis one must first identify the exposure. In MR analysis, identifying the exposure includes deciding how to instrument the exposure. Traditionally this has been to select independent genetic variants reaching a genome-wide significance threshold (5 × 10⁻⁸) from the largest available GWAS. For increased adiposity measures, especially BMI, there are now many GWASs available for researchers to choose from. Chapter 3 explores how to instrument increased adiposity, including investigating the relationship between the exposures and the different GWASs available.
2.2.2.3.2 Progress
This will be a short chapter with a small amount of analysis showing the appropriateness of the instruments selected for the MR analysis. Some of the analysis has been conducted and some of the chapter has been written. An unformatted draft can be viewed on GitHub.
First, I have looked at the correlation between the three measures of increased adiposity. Figure 5 shows the correlation analysis in ALSPAC men (red) and women (grey) for BMI and WHR (left), BMI and BF% (middle) and BF% and WHR (right). This shows that sexual dimorphism is strong in BF% and WHR, suggesting sex plays an important role in the deposition of fat around the body and should be considered when instrumenting this phenotype. The next steps will be to investigate the associations of each trait’s instruments with each of the measured phenotypes in ALSPAC, followed by investigating associations with potential measured confounders in ALSPAC. I will need to repeat these analyses in a second cohort (FGFP/INTERVAL/Biobank). This chapter will be small and, as the plan for the thesis is to provide a pipeline, potential time-specific instruments will be briefly discussed as a consideration but no analysis will be performed.
Figure 5: Scatter plots of ALSPAC individuals’ data on measures of increased adiposity
Figure 6: F statistics for 4 measures of adiposity (BF%, BMI, WHR, WHRadjBMI) across multiple SNP sets. The mean is given as a black diamond and the blue line indicates an F statistic of 10
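To illustrate the instrument-strength check summarised by the F statistics, the sketch below uses one common per-SNP approximation, F ≈ (beta / SE)², computed from GWAS summary statistics. This is an assumption about the form of the calculation and is not necessarily the formula behind Figure 6. The SNP names and values are hypothetical.

```python
def f_statistic(beta: float, se: float) -> float:
    """Approximate per-SNP F statistic from a SNP-exposure beta and its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (beta / se) ** 2


def summarise_instruments(snps: dict, threshold: float = 10.0) -> dict:
    """Return the mean F statistic and the SNPs falling below the threshold."""
    f_values = {name: f_statistic(beta, se) for name, (beta, se) in snps.items()}
    mean_f = sum(f_values.values()) / len(f_values)
    weak = sorted(name for name, f in f_values.items() if f < threshold)
    return {"f_values": f_values, "mean_f": mean_f, "weak": weak}


if __name__ == "__main__":
    # Hypothetical SNP-exposure associations: name -> (beta, standard error)
    example_snps = {
        "rs_example_1": (0.08, 0.01),
        "rs_example_2": (0.05, 0.01),
        "rs_example_3": (0.02, 0.01),
    }
    summary = summarise_instruments(example_snps)
    print(f"mean F = {summary['mean_f']:.1f}; below 10: {summary['weak']}")
```

An F statistic of 10 is the conventional rule-of-thumb threshold below which an instrument is considered weak.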
2.2.2.4 Chapter 4: Observational analysis
2.2.2.4.1 Overview
Having established how to instrument increased adiposity in observational and MR analyses in Chapter 3, this chapter explores the observational associations of increased adiposity and metabolites. The chapter will focus on observational associations and confounders. A number of studies have investigated the observational associations of BMI and metabolites23–25 but I have not found many studies looking at BF%23 or WHR25, and none looking at all three measures in a single cohort. If there are similarly measured metabolites in these studies, it may also be possible to compare estimates within this chapter, especially with regard to the study by Murphy et al. 201723, which looked at BF% in black men.
2.2.2.4.2 Progress
Not started. This analysis will be performed in ALSPAC and replication in one of FGFP, INTERVAL or Biobank (we should have access to the metabolomics data for Biobank in the coming months).
2.2.2.5 Chapter 5: MR step 1
2.2.2.5.1 Overview
This chapter is the first step of the MR process in identifying intermediate metabolites. The main analysis includes 3 adiposity exposures and 123 metabolites derived using NMR from Kettunen et al (2016)22. Additional sensitivity analysis of 15 other measures of adiposity and 4 methods has also been performed. The three exposures, BMI, BF% and WHR, were selected because they are the measures of adiposity most often used by researchers and clinicians. Figure 7 shows the effect estimates of the main analysis using the IVW multiplicative random effects model (IVW-MPE) across the three exposures and all 123 metabolites.
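For readers unfamiliar with the IVW estimator, the sketch below shows one textbook form of an inverse variance weighted estimate with multiplicative random effects, computed from per-SNP summary statistics. It is an illustration under stated assumptions, not the code used for the analyses in this chapter. The weights use first-order standard errors, the fixed-effect standard error is scaled by the residual standard error, and some implementations additionally prevent that scaling factor from falling below 1. All input values are hypothetical.

```python
from math import sqrt


def ivw_mre(beta_exposure, beta_outcome, se_outcome):
    """Return (estimate, standard error) for an IVW estimate with
    multiplicative random effects.

    Each SNP contributes a ratio estimate beta_outcome / beta_exposure,
    weighted by beta_exposure^2 / se_outcome^2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")
    n_snps = len(beta_exposure)
    if n_snps < 2:
        raise ValueError("at least two SNPs are needed")

    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)

    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_fixed = sqrt(1 / total_weight)

    # Heterogeneity between ratio estimates (Cochran's Q) and the residual
    # standard error used to scale the fixed-effect standard error.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    phi = sqrt(q / (n_snps - 1))
    return estimate, se_fixed * phi


if __name__ == "__main__":
    # Hypothetical SNP-exposure and SNP-outcome associations
    bx = [0.10, 0.20, 0.15]
    by = [0.05, 0.12, 0.06]
    se_y = [0.01, 0.01, 0.02]
    estimate, se = ivw_mre(bx, by, se_y)
    print(f"IVW estimate = {estimate:.3f}, SE = {se:.3f}")
```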
The 15 additional measures of adiposity are different instrument sets (instruments obtained from different GWASs) for BMI, BF% and WHR, as well as WHR adjusted for BMI. From the systematic review I have found that, particularly in the case of BMI, the number of SNPs used varies considerably among studies. This variability may influence the results, especially when comparing a study using 10 SNPs as a BMI instrument with one using 941 SNPs. As such, including different instrument sets will allow me to look at potential limitations of using smaller or larger SNP lists as instruments. Furthermore, as my main analysis uses BMI SNPs identified in a meta-analysis that included UK Biobank, it is important to have a comparison, as population structure within UK Biobank26 may impact analyses.
The total number of tests performed in the main analysis (BMI, BF%, WHR) using an IVW-MPE model is 369. Including the sensitivity analysis methods (MR Egger, weighted median, weighted mode), the number of tests is 1,476. The total number of tests across all exposures (3 main and 15 sensitivity) and all 4 methods is 8,856. In addition, I have performed the same analysis with a second metabolite data set of 452 metabolites20 (32,544 total tests). I think this second analysis will probably not be included in the thesis as the metabolite GWAS is not as well powered or clean as the Kettunen metabolite GWAS. The total number of tests across both metabolite data sets is 41,400.
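The test counts follow from multiplying the number of exposures, metabolites and methods. The short sketch below reproduces the arithmetic, assuming that the second metabolite data set was analysed with the same 18 exposures and 4 methods, which is consistent with the totals quoted. Variable names are illustrative.

```python
# Counts taken from the text
n_main_exposures = 3          # BMI, BF%, WHR
n_sensitivity_exposures = 15  # additional instrument sets
n_methods = 4                 # IVW-MPE, MR Egger, weighted median, weighted mode
n_nmr_metabolites = 123       # Kettunen et al. NMR metabolites
n_second_metabolites = 452    # second metabolite data set

n_exposures = n_main_exposures + n_sensitivity_exposures

main_ivw_tests = n_main_exposures * n_nmr_metabolites
main_all_methods_tests = main_ivw_tests * n_methods
nmr_tests = n_exposures * n_nmr_metabolites * n_methods
second_tests = n_exposures * n_second_metabolites * n_methods
total_tests = nmr_tests + second_tests

print(main_ivw_tests, main_all_methods_tests, nmr_tests, second_tests, total_tests)
```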
This work is similar to a paper by Peter Wurtz in 201427, which looked at the causal associations (observational analyses were also performed) between BMI and 82 metabolic signatures (including systolic and diastolic blood pressure) in 12,664 individuals of Finnish ancestry. This analysis provides a comprehensive update of the previous work, incorporating three measures of adiposity, ~40 additional metabolites in the main analysis and a secondary analysis including 452 metabolites measured using a different metabolomics platform. The additional steps of exploring instrumentation of metabolites and taking these out to disease are a key difference between the two works.
There is scope here to replicate this analysis using a second metabolite data set from INTERVAL or from a soon(?) to be released metabolite GWAS from Claudia Langenberg’s group in Cambridge.
Figure 7: Circos plot of two-sample MR analysis using an IVW-MPE model showing effect estimates for BMI, WHR and BF% with 123 NMR derived metabolites grouped by super pathway
2.2.2.5.2 Progress
I am two-thirds of the way through the manuscript, which has been written as if I were writing the chapter. I need to finish the manuscript, transfer it into the chapter and then cut the manuscript down to form a publishable document. This project is my first attempt at being completely reproducible with my code, which is laid out in full on GitHub (currently private).
2.2.2.6 Chapter 6: MR Viz
2.2.2.6.1 Overview
Having performed a large MR analysis of 3 exposures and 123 outcomes (369 tests) as the main analysis, plus 3 sensitivity methods (1,476 tests in total), and an additional 15 measures of adiposity (1,845 tests), each analysed with all 4 methods (7,380 tests), the total number of tests is 8,856. It is difficult to visualise and interpret all of this data. Given that we want to look at the global profile of metabolite changes as a result of increased adiposity, we need to be able to visualise this data in an interpretable manner. This chapter demonstrates a web application and R package developed to create Circos plots to visualise these types of MR analyses. The tool aids interpretation of large studies like this as it enables researchers to gain a global overview of their analyses very quickly, and allows them to compare, for example, differences between exposures and multiple groups of metabolites. Because the human eye is adept at pattern recognition, visualisations such as this that provide a global overview allow efficient identification of similarities and dissimilarities between exposures/outcomes that can then be followed up.
2.2.2.6.2 Progress
I am two-thirds of the way through the manuscript. The manuscript and the GitHub page will be adapted to form the chapter, so essentially two-thirds of the chapter is complete. The web application is in a beta stage and is useable; I need to do some focus-group work with the group to make the website user friendly and incorporate any additional features/wording they think are needed. The R package is available on GitHub and is in the final stage, with testing still to be done. All of this, including the manuscript, should be finished within the next two months; I just need to get some people together to play around with the app and R package to make sure they work and don’t break. An example plot made with the R package can be seen below along with screenshots of the application.
Figure 8: Circos plot produced using R package
Figure 9: Home page of the MR Viz web app
Figure 10: Upload and check data page of the MR Viz app
Figure 11: Circos plot creation page of the MR Viz web app
2.2.2.7 Chapter 7: Clustering metabolites
2.2.2.7.1 Overview
Having identified metabolites associated with increased adiposity from visualising the global profile, we need to decide how to instrument them in the second step of the MR analysis. There are in essence two ways to do this: either use each metabolite individually, as one would normally, or combine metabolites into a group that one then instruments. In chapter 9 I will use both individual metabolites and groups. In this chapter I will compare a number of different methods for clustering metabolites into groups that I can then instrument. Once clustered, the group of metabolites would be treated as a single exposure in an MR analysis and the genetic variants for each metabolite in the group would be used to instrument the exposure. It is likely that there will be shared genetic variants within these groups. In such instances rules will need to be set, for example to identify which beta to use. These rules will be discussed in chapter 8.
The clustering in this chapter can be approached in one of two ways. First, I can start with the entire metabolite data set of 123 metabolites and then perform clustering analyses to identify groups. Alternatively, I can narrow down the number of metabolites before this step, for example by setting a p-value threshold for associations in the MR analysis, and cluster only these metabolites.
2.2.2.7.2 Progress
Not started. Below is a list of different clustering methods that I could look at. I think the best approach is to take three or four methods to perform analysis with and then briefly comment on two or three other methods. The class and subclass groupings, which can be easily implemented (this information is provided with metabolomics data), will be used as the baseline. I think this is still open to discussion, but self organising maps, ontology and factor analysis are my preferred methods to explore. Self organising maps are artificial neural networks which at small scales function similarly to k-means clustering. Ontology in this instance means turning each metabolite name into a vector and identifying the distance between each pair of vectors using semantic databases. Factor analysis is similar to principal component analysis (PCA) but its objective is to identify the latent variables, in this instance the groups of metabolites.
The workflow for this chapter would consist of two parallel analyses, one using the metabolite data used in chapter 5 and the other being simulated data in which the number of underlying groups within the data is known.
- Priors
- class
- subclass
- biological pathway
- size
- shared genetic variants
- No priors
- PCA
- factor analysis
- Hierarchical clustering
- density clustering
- self organising map
- LDSR
- ontology
- have discussed with Ben Elsworth - a pipeline is set-up that can be adapted to implement this and would be an interesting case study for their paper.
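The simulated-data arm of the workflow can be sketched as follows. This is a minimal pure-Python illustration, not one of the methods proposed for comparison. It simulates metabolites that share a latent group factor and then recovers the groups by joining any two metabolites whose absolute Pearson correlation exceeds a threshold, which is equivalent to single-linkage hierarchical clustering cut at that threshold. The group sizes, noise level, threshold and names are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_samples, group_sizes, noise_sd, seed):
    """Simulate metabolites as a shared latent group factor plus noise."""
    rng = random.Random(seed)
    data, truth = {}, {}
    for group, size in enumerate(group_sizes):
        latent = [rng.gauss(0, 1) for _ in range(n_samples)]
        for member in range(size):
            name = f"group{group}_metabolite{member}"
            data[name] = [value + rng.gauss(0, noise_sd) for value in latent]
            truth[name] = group
    return data, truth


def cluster_by_correlation(data, threshold):
    """Join metabolites whose absolute correlation is at least the threshold."""
    names = list(data)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(data[a], data[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    data, truth = simulate(n_samples=200, group_sizes=[4, 3, 5], noise_sd=0.3, seed=1)
    for cluster in cluster_by_correlation(data, threshold=0.5):
        print(sorted(cluster))
```

Because the true group of each simulated metabolite is known, the recovered clusters can be compared directly with the truth, which is the purpose of running each clustering method on simulated data alongside the real metabolite data.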
2.2.2.8 Chapter 8: Instrumentation
2.2.2.8.1 Overview
I have split this out as a separate chapter for ease of organising the thesis, but most likely it will be combined with chapter 7.
Having now explored the different methods for clustering the metabolites, I need to establish the rules for how to then instrument these different types of clusters. For example, if the same SNP appears twice in a cluster, which beta for that SNP do you use? This chapter will lay out a set of rules for instrumenting metabolites in MR analysis. One way to work out what these rules should be is to discuss them with researchers who use this type of data and perform similar analyses.
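As an example of the kind of rule that could be considered, the sketch below resolves a SNP that is associated with more than one metabolite in a cluster by keeping the association with the largest absolute beta/SE. This rule is purely hypothetical and is shown only to make the problem concrete; the rules themselves are still to be decided. The SNP identifiers, metabolite names and values are invented.

```python
def resolve_shared_snps(associations):
    """Keep one association per SNP: the one with the largest |beta / se|.

    `associations` is a list of dicts with keys "snp", "metabolite", "beta", "se".
    This is one hypothetical rule, not an agreed standard.
    """
    best = {}
    for row in associations:
        strength = abs(row["beta"] / row["se"])
        current = best.get(row["snp"])
        if current is None or strength > abs(current["beta"] / current["se"]):
            best[row["snp"]] = row
    return list(best.values())


if __name__ == "__main__":
    # Hypothetical cluster of two metabolites sharing the SNP "rs_shared"
    cluster_associations = [
        {"snp": "rs_shared", "metabolite": "metabolite_A", "beta": 0.10, "se": 0.02},
        {"snp": "rs_shared", "metabolite": "metabolite_B", "beta": -0.20, "se": 0.02},
        {"snp": "rs_unique", "metabolite": "metabolite_A", "beta": 0.05, "se": 0.01},
    ]
    for row in resolve_shared_snps(cluster_associations):
        print(row["snp"], row["metabolite"], row["beta"])
```

Alternative rules, such as averaging betas across metabolites or keeping the association from the largest GWAS, would slot into the same structure by changing the comparison.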
2.2.2.8.2 Progress
Not started.
2.2.2.9 Chapter 9: MR step 2
2.2.2.9.1 Overview
Having now identified metabolites associated with increased adiposity, metabolite clusters, and how to instrument clusters we can perform the second step of the MR investigating metabolite associations with diseases. We will use MR Base to obtain outcomes and will select these outcomes based on results from the systematic review (chapter 2).
It will be possible to run the MR against all diseases in MR Base and if there is time this could be provided as a searchable database for researchers to use.
2.2.2.9.2 Progress
I am working with Ben Elsworth to group all MR Base GWASs into categories for easy subsetting, so that analyses can be performed on, for example, all anthropometric traits with all smoking traits. I have completed the groundwork for the code for this analysis and will test it using a few metabolites and a few diseases. The code for this test MR is scalable, so once Chapters 7 and 8 are complete the analysis should be finished relatively quickly.
2.2.2.10 Chapter 10: Discussion/limitations/conclusion
2.2.2.10.1 Overview
This chapter will tie everything together and present a diagram that outlines the pipeline for performing MR analysis of this type.
2.2.2.10.2 Progress
Not started.
2.2.3 Other
2.2.3.1 Courses
This figure shows the courses I have been on this year
I am booked onto the Economic Evaluation Bristol Medical Short Course. The course, though focussed on economics, is about prediction. It would be interesting to see the use of metabolomics as predictors with or in comparison to adiposity measures for different diseases and this course would provide the skills needed to perform the analyses and understand the outputs. The course is 3 days (22-24 June).
I hope to do a placement at Mount Sinai with Ruth Loos as discussed in detail in the Other work section below.
2.2.3.2 Conferences/ presentations
I have presented my work at the following:
- Faculty of Health Sciences research showcase, Bristol, United Kingdom; presentation - Metabolite profiles as markers of risk
- Faculty of Health Sciences research showcase, Bristol, United Kingdom; poster - Metabolite profiling of multiple measures of adiposity: A Mendelian randomization analysis
- Metabolomics 2019, The Hague, The Netherlands; poster - Metabolite profiling of multiple measures of adiposity: A Mendelian randomization analysis
- MR conference 2019, Bristol, United Kingdom; poster - MR-Vis: A tool for the visualisation of high-dimensional Mendelian randomization results
2.2.3.3 Teaching
I have taught on the following:
- Mendelian randomization, Bristol Medical School short course
- Tutor helping participants create an MR study design they present to the class
- Tutor helping participants on two practicals: data harmonization and two-sample MR
- Mendelian randomization conference MR course, conference workshop
- Lecture on two-sample MR
- Introduction to R, Bristol Medical School short course
- Tutor helping participants on a data manipulation practical
- Introduction to data visualisation and web applications using R, Bristol Medical School short course
- Tutor helping on practicals on data visualisations and Rmarkdown
- A one week course at the University of Pavia: Causal Inference and Mendelian randomization, Department of Brain and Behavioural Sciences, University of Pavia, Italy
- Kaitlin Wade (lead/organiser) and I taught a condensed version of the MR short course to ~20 participants
The teaching I have done has really helped develop my communication skills for academic audiences. The teaching has really helped improve my understanding of MR in particular as I have been required to explain to, and answer questions from, people with different academic backgrounds. I really enjoy teaching, and want to continue in the future with some of it, but time wise I think I will do less in my third year because of the need to focus on my thesis.
2.2.3.4 Public engagement
Public engagement is a passion of mine and I did quite a lot during my first year. I decided to cut down a lot during my second year and again in this third year have decided to do less. In my second year I was involved in the following public engagement activities:
- Creative Reactions, lead - 50 artists and 50 researchers with > 5,000 visitors
- Creative Reactions, participated - research turned into an artwork
- Talks - I have given a number of talks to the public
- MRC IEU at Greenman festival - research stand showcasing work of the IEU
- ~£20,000 in grants awarded in review period (including £14,985 from the EPSRC) for Creative Reactions
- Nic and I are writing an application to the Wellcome Trust to fund a public engagement project for the research group for ~£50,000. The grant is to fund an artist to work with the group to communicate the group’s research to target audiences. Administrative costs have been included for an administrator to run the project; my role will be to oversee the project.
- I am working with a research fellow in Maths on a bid to Arts Council England for ~£50,000 - supported by Head of the School of Arts and Maths and PHSI (Caroline) - that will match fund the Wellcome Trust bid above and provide funds for two artists. Administrative costs have been included for an administrator to run the project; my role will be to oversee the project.
2.2.4 Other work
2.2.4.1 Placement
Funds have been requested (to extend the PhD by three months) to enable Matthew to work with Professor Ruth Loos at the Icahn School of Medicine at Mount Sinai, New York and undergo training in computational and data analytics to characterise the genetics of body composition, which will enable him to expand his current analysis investigating adiposity -> metabolites -> diseases to more accurate measures of body composition.
Matthew has investigated the effects of BMI and WHR but the lack of strong genetic characterisation for BF limits the understanding we can gain from these analyses. Matthew has also been using combinations of measures (profiles) to investigate the underlying biological mechanisms driving associations and is developing ways to investigate metabolites as profiles of risk. Combining the skills, expertise and data of our and Professor Loos’ groups will enable Matthew to expand his work into BF, and thus gain a better understanding of the biological mechanisms driving associations between increased adiposity and disease.
Matthew will work with Professor Loos, a world leader in human genetics and body composition, to explore BF genetics and increased adiposity profiles. Professor Loos is a member of the steering committee for the global BMI genome-wide association study (GWAS) consortium and set up the global body composition GWAS consortium. The Loos Lab is an interdisciplinary team aiming to identify and characterize genes to better understand biological pathways. The team includes JJ Wang, a Post-doctoral Fellow with expertise in integrating genomics and metabolomics data; Arden Moscati, a Computational Geneticist with expertise in aetiological overlap between traits and diseases; and Daiane Hemerich, a Post-doctoral Fellow with expertise in fine-mapping and functional annotation. Much of the lab’s work is conducted in BioMe, a biobank of ~50,000 ancestrally diverse (European (32%), African (24%), Hispanic/Latino (35%) and other/mixed ancestries (9%)) individuals.
Matthew will contribute to characterizing the genetics of BF using whole genome and exome sequence data. This will involve the use of fine-mapping, co-localisation and integrated approaches to investigate the genetics of multiple measures of body composition. This placement, discussed and agreed with Professor Loos, is a natural and valuable progression of Matthew’s work to develop our understanding of the mechanisms driving disease risk. With UK Biobank due to release metabolite data at a similar time, the placement will provide the appropriate data and skills to (i) identify genetic variants for MR investigations of metabolites and diseases and (ii) to integrate these genetic variants with the metabolomics work already completed and to be performed in UK Biobank.
2.2.4.2 GWAS of glycosuria
We conducted a genome-wide association study (GWAS) of glycosuria (sugar in urine) in pregnant mothers from the Avon Longitudinal Study of Parents and Children (ALSPAC). Because no suitable external data sources were available, direct replication was not possible; instead, we performed a GWAS in the Northern Finland Birth Cohort 1986 (NFBC1986) using the mothers’ phenotype and their offspring’s genotype. To estimate the maternal effects from offspring genotypes we doubled the effect estimates and standard errors of the GWAS results28–30. The GitHub repository provides all scripts and data. We are currently making revisions in response to reviewer comments.
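The doubling step can be illustrated with a short sketch. An offspring carries half of the maternal genome, so an association estimated with offspring genotype is expected to be roughly half the size of the corresponding maternal effect. Doubling both the effect estimate and its standard error rescales the estimate while leaving the test statistic unchanged. The values below are hypothetical and are on the log-odds scale.

```python
from math import exp


def maternal_from_offspring(beta_offspring: float, se_offspring: float):
    """Approximate the maternal effect from an offspring-genotype association
    by doubling the effect estimate and its standard error."""
    return 2 * beta_offspring, 2 * se_offspring


if __name__ == "__main__":
    # Hypothetical offspring-genotype association (log odds ratio and SE)
    beta_offspring, se_offspring = 0.20, 0.05
    beta_maternal, se_maternal = maternal_from_offspring(beta_offspring, se_offspring)

    # The z statistic is the same before and after doubling.
    print(f"z offspring = {beta_offspring / se_offspring:.2f}")
    print(f"z maternal  = {beta_maternal / se_maternal:.2f}")
    print(f"approximate maternal odds ratio = {exp(beta_maternal):.2f}")
```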
2.2.4.2.0.1 Abstract
Glycosuria is a condition where glucose is detected in urine at higher concentrations than normal. Glycosuria at some point during pregnancy has an estimated prevalence of 50% and is associated with adverse outcomes in both mothers and offspring. Little is currently known about the genetic contribution to this trait or the extent to which it overlaps with other seemingly related traits, e.g. diabetes. We performed a genome-wide association study (GWAS) for self-reported glycosuria in pregnant mothers from the Avon Longitudinal Study of Parents and Children (ALSPAC; cases/controls=1,249/5,140). We identified two loci, one of which (lead SNP=rs13337037; chromosome 16; odds ratio (OR) of glycosuria per effect allele: 1.42; 95% CI: 1.30,1.56; P=1.97 × 10⁻¹³) was then validated using an obstetric measure of glycosuria measured in the same cohort (227/6,639). We performed a secondary GWAS in the 1986 Northern Finland Birth Cohort (NFBC1986; 747/2,991) using midwife-reported glycosuria and offspring genotype as a proxy for maternal genotype. The equivalent effect estimate for rs13337037 in this cohort was OR 1.57 (95% CI: 1.30,1.83; P=9.8 × 10⁻⁴). In follow-up analyses, we saw little evidence of shared genetic underpinnings with the exception of urinary albumin-to-creatinine ratio (Rg=0.64; SE=0.22; P=0.0042), a biomarker of kidney disease. In conclusion, we identified a genetic association with self-reported glycosuria during pregnancy, with the lead SNP located 15 kb upstream of SLC5A2, a target of anti-diabetic drugs. The lack of strong genetic correlation with seemingly related traits such as type 2 diabetes suggests different genetic risk factors exist for glycosuria during pregnancy.
Figure 12: Manhattan plot of a GWAS of glycosuria in ALSPAC mothers